Mini Challenge 1: Wiki Editors
Authors and Affiliations:
Student Team: NO
Tool(s): For the VAST competition, the analyses were performed primarily in the Palantir Government platform and, to a lesser extent, in Google Earth and the Palantir Finance platform. Both Palantir platforms are being developed by Palantir Technologies, based in Palo Alto, California. Palantir Technologies was founded in 2004 and works with customers across the intelligence and finance communities.
Two Page Summary: YES (will be submitted before 18 Aug)
Wiki-1: What are the factions represented in the edit pages and who are its members? In other words, describe the groups and their members based on their editing changes.
Video link:
Wikipedia edits are a great example of the increasingly complex datasets that we are required to analyze today: they are extremely large, have a low signal-to-noise ratio, and extracting knowledge from them relies primarily on human intuition. In this environment, information is plentiful and analysis is scarce. Computers are facilitators, not agents, of analysis. Palantir, therefore, focuses on enabling humans to ask high-level questions about their datasets and having the computer responsively display the answers.

To start this investigation, we imported the edits and discussions as events and documents, each linked to a username and date. The Palantir Dynamic Ontology allows us to define all of the objects, events, properties, and links we need to model any data set, meaning that an end user can modify Palantir’s ontology, without code changes, to accept almost any type of data.

Figure 1: Data Import

Below we see all of the edits on the Palantir Graph, with the usernames of the wiki users in a histogram on the right (ranked by number of edits performed) and a timeline of those events:

Figure 2: All the “Wiki Edits” gridded in our Graph Explorer

We analyzed the edits in two ways: time-centric and user-centric. We divided our team of four analysts into two “cells” with different approaches.

The time-centric team noticed periodic spikes of activity, probably representing controversial edits of a certain segment or “flame wars” (figure 3). We decided to zoom in on these segments of the discussion and disregard the quieter segments. We lined the edits up sequentially and used Palantir to link them by related entities, allowing us to easily see who was responsible for which posts (figure 4).

Figure 4: Rm99, Agustin, and VictoriaV battling it out

We then used the Browser view to examine the details of each post in order to identify potential factions. Although these were sometimes readily determinable from the contents of the post, often the flame wars consisted of little more than an initial comment and multiple people posting “rv [revert]—vandalism” in an attempt to restore their favored version of the text (figure 5).

Figure 5: A revert war from the original file of edits

We thus had a visual representation of rivalries (such as that of Rm99 and VictoriaV), but we sometimes needed more context on the users involved to give those rivalries meaning.

To find this context, we turned to our second cell, which had been operating simultaneously. Palantir’s collaboration capabilities allow multiple individuals or teams to freely manipulate a dataset knowing that they will not corrupt the original version, as each user operates in their own virtual private repository (“VPR”). At the same time, any insights that one group wishes to share with other investigators can easily be published to the base repository available to all analysts (similar to a CVS or SVN model). This team was tasked with an alternate approach: a user-centric analysis of the edits. By selecting all edits in the graph, we quickly determined the top users from the Histogram (figure 6.1).

Figure 6.1: The Histogram
Figure 6.2: Just click and Palantir will find any entities linked to the selected edits
Figure 6.3: Top users
linked to their edits and discussions

We linked blocks of edits together by user and removed the rest from view (figure 6.2). Removing items from the graph in Palantir is also non-destructive: it affects only the user’s VPR until changes are published. We then added events of type “Wiki Discussion” to the graph and had Palantir link them to their owner (figure 6.3). The majority of talk page entries were tied to minor players in the discussion and were discarded.

Viewing all posts of a given user made the faction divisions mostly clear. Even when we couldn’t tell directly from the content, we combined rivalries discovered by team one with known entities from team two to infer the unknown faction allegiance. For example, we read Agustin saying “Catalano’s religion is… a hedonistic religion… of misogyny and greed” during our user-centric analysis and noticed RyogaNica getting into “revert-wars” with Agustin during our time-centric analysis. We also noticed RyogaNica getting into a fight with Edemir (an ambiguous character) and supporting Amado (a figure in favor of the movement). Thus, we can say with high confidence that RyogaNica supports the Paraiso movement.

Using this workflow, we broke the top players (more than five edits) into several factions. The two obvious categories were strong supporters and strong opponents of the movement. We also found that several editors didn’t fit easily on either side, so we created a neutral category. We further subdivided the neutral category because several posters made comments on both sides of the issue or were posting fair information in opposition to the movement. We did not want to group a fair attempt to critically evaluate the Paraiso movement with people posting “Paraiso is B******T” or even those showing consistent personal opposition to the religion.

VictoriaV (pro) and Rm99 (anti) were the first two people assigned to factions because they were easy to identify.
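
The inference used for RyogaNica can be written down as a small sketch. This is only an illustration of the reasoning, not something the team ran: the labelling was done by reading posts in Palantir. The numeric encoding (+1 for pro, -1 for anti), the `interactions` list, and the function name `infer_factions` are assumptions introduced here; the editor names and their relationships come from the text.

```python
from collections import defaultdict

PRO, ANTI = +1, -1

# Editors whose stance was read directly from their own posts (from the text).
known = {"VictoriaV": PRO, "Rm99": ANTI, "Agustin": ANTI, "Amado": PRO}

# Observed interactions as (editor, other, sign):
# -1 = revert war or fight, +1 = support. Hypothetical encoding.
interactions = [
    ("RyogaNica", "Agustin", -1),  # revert wars with an opponent
    ("RyogaNica", "Edemir", -1),   # Edemir is ambiguous, so this adds nothing
    ("RyogaNica", "Amado", +1),    # supports a supporter
]


def infer_factions(known, interactions):
    """Single-pass vote: fighting an 'anti' or backing a 'pro' counts as pro."""
    scores = defaultdict(int)
    for editor, other, sign in interactions:
        # Each interaction is evidence about both participants.
        for a, b in ((editor, other), (other, editor)):
            if a in known:
                continue
            # An unlabelled counterpart contributes zero evidence.
            scores[a] += sign * known.get(b, 0)
    labels = {}
    for editor, score in scores.items():
        labels[editor] = "pro" if score > 0 else "anti" if score < 0 else "unclear"
    return labels


print(infer_factions(known, interactions))
```

The sketch makes a single pass over the interactions. Feeding newly labelled editors back into `known` and repeating would let labels propagate further, at the cost of compounding any early mistakes, which is why ambiguous editors are left as "unclear" here rather than guessed.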
Although VictoriaV hides her partisanship behind Wikipedia rules (“NPOV pushing!”), we quickly learned to see behind this kind of mask. Agustin (anti) was the next assignment, based on his discussion posts. The remaining partisans were assigned largely based on their support of, or opposition to, those categorized before them.

The anti-vandalism bot was clearly neutral, but we also placed ambiguous cases (Edemir, 66.66.125.x, Salvatora, Sara, Adriano, and Ricarda) in our neutral bucket. Careful reading of posts led us to believe that Adriano was really focused on grammar and formatting. Salvatora is a moderator, and Ricarda never makes major changes. Edemir, 66.x, and Sara occasionally focused on grammar, but more often delved into content. Because they occasionally clashed with both sides but sometimes posted critical content, we labeled them “fair,” meaning they may not support the movement but are not openly biased against it either.

Finally, we pulled up all the posts by BakBOT, which automatically reverts mass deletions, to see if fringe-radical opponents of the religion had anything interesting to say. We discovered one allegation that the movement had killed several health professionals, but most of the comments held little more than name-calling.

The Palantir platform thus transformed a text file of Wikipedia edits and a Word document of discussions into a rich, interactive investigation that we analyzed relationally and temporally to determine the factions of the movement.
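
The temporal side of the analysis, spotting the spikes of activity that marked flame wars, was done visually on the Palantir timeline. Outside Palantir, a rough equivalent might look like the sketch below. The function name `find_bursts`, the threshold rule (mean plus `k` standard deviations of daily edit counts), and the sample dates are all assumptions made for illustration; none of them come from the dataset or from Palantir.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def find_bursts(edit_dates, k=2.0):
    """Return the days whose edit count exceeds mean + k * std of daily counts.

    Only days with at least one edit are counted, which is a simplification.
    """
    per_day = Counter(edit_dates)
    counts = list(per_day.values())
    if not counts:
        return []
    threshold = mean(counts) + k * pstdev(counts)
    return sorted(day for day, n in per_day.items() if n > threshold)


# Made-up timestamps: nine quiet days with one edit each, then a 12-edit day.
quiet_days = [date(2008, 1, d) for d in range(1, 10)]
flame_war = [date(2008, 1, 10)] * 12
print(find_bursts(quiet_days + flame_war))
```

Days flagged this way would be the segments to zoom in on; the quieter days are dropped from view, mirroring the choice the time-centric cell made by eye.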
Wiki-2: Is the Paraiso movement involved in violent activities?